Store raw path bytes in Diff instances #474
Merged
Previously, the following fields on Diff instances were assumed to be passed in as unicode strings:
a_path
b_path
rename_from
rename_to
However, since Git natively records paths as bytes, these paths may not have a valid unicode representation.
This patch changes the Diff instance to instead take the following equivalent fields, which hold raw bytes:
a_rawpath
b_rawpath
raw_rename_from
raw_rename_to
NOTE ON BACKWARD COMPATIBILITY:
The original a_path, b_path, etc. fields are still available as properties (rather than slots). These properties now dynamically decode the raw bytes into a unicode string (performing the potentially destructive operation of replacing invalid unicode chars with "�").
This means that all code using Diffs should remain backward compatible. The only exception is code that constructs Diff instances manually by calling the constructor directly; such code should now pass in bytes rather than unicode strings.
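As a rough illustration of the pattern described above, here is a minimal sketch. It is not the actual Diff class from this patch: DiffSketch, its two-argument constructor, and the _decode helper are hypothetical names, and UTF-8 is assumed as the decoding codec.

```python
class DiffSketch(object):
    """Hypothetical stand-in for Diff, not the actual class from this patch."""

    # Only the raw byte fields are stored as slots.
    __slots__ = ("a_rawpath", "b_rawpath")

    def __init__(self, a_rawpath, b_rawpath):
        # Callers pass bytes (or None when that side has no path).
        for raw in (a_rawpath, b_rawpath):
            if raw is not None and not isinstance(raw, bytes):
                raise TypeError("raw paths must be bytes or None")
        self.a_rawpath = a_rawpath
        self.b_rawpath = b_rawpath

    @staticmethod
    def _decode(raw):
        # Assumption: UTF-8. Invalid byte sequences are replaced by U+FFFD,
        # so the decoded string cannot always be turned back into the raw path.
        return None if raw is None else raw.decode("utf-8", "replace")

    @property
    def a_path(self):
        # Decoded on every access; nothing is cached.
        return self._decode(self.a_rawpath)

    @property
    def b_path(self):
        return self._decode(self.b_rawpath)


# The first path is valid UTF-8; the second contains the invalid byte 0xff.
diff = DiffSketch(b"caf\xc3\xa9.txt", b"bad\xff.txt")
print(repr(diff.a_path))
print(repr(diff.b_path))
print(repr(diff.b_rawpath))
```

Code that needs the exact on-disk name (for example, to open the file) would read the raw field, while the decoded property is best treated as a display value.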
See also the discussion on #467